Option compatible reward inverse reinforcement learning

Authors

Abstract

Reinforcement learning in complex environments is a challenging problem. In particular, the success of reinforcement learning algorithms depends on a well-designed reward function. Inverse reinforcement learning (IRL) solves the problem of recovering reward functions from expert demonstrations. In this paper, we solve hierarchical inverse reinforcement learning within the options framework, which allows us to utilize intrinsic motivation. A gradient method for parametrized options is used to deduce a defining equation for the Q-feature space, which leads to a reward feature space. Using a second-order optimality condition on the option parameters, an optimal reward function is selected. Experimental results in both discrete and continuous domains confirm that our recovered rewards provide a solution to IRL using temporal abstraction, and in turn are effective in accelerating transfer learning tasks. We also show that the method is robust to noise contained in expert demonstrations.
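To make the construction concrete, the following is a rough sketch of the kind of first-order condition compatible-reward approaches build on; the notation (expert parameters \theta_E, option o, expert occupancy \mu^E) is ours and not taken from the paper. If the expert's option policy \pi_{\theta_E} is optimal, its parameters must be a stationary point of the expected return:

$$ \nabla_\theta J(\theta_E) \;=\; \mathbb{E}_{(s,o)\sim\mu^{E}}\!\left[ \nabla_\theta \log \pi_{\theta_E}(o \mid s)\, Q(s,o) \right] \;=\; 0 . $$

Every Q-function satisfying this condition lies in the null space of the expert's score features, which defines the Q-feature space; pushing that space through the Bellman equation gives a reward feature space, and the second-order (Hessian) optimality condition on the option parameters then singles out one reward for which the expert's behavior is a strict maximum.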


Similar articles

Compatible Reward Inverse Reinforcement Learning

PROBLEM
• Inverse Reinforcement Learning (IRL) problem: recover a reward function explaining a set of expert's demonstrations.
• Advantages of IRL over Behavioral Cloning (BC):
  – Transferability of the reward.
• Issues with some IRL methods:
  – How to build the features for the reward function?
  – How to select a reward function among all the optimal ones?
  – What if no access to the environment? ...
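As a small illustration of the feature-construction issue raised above, here is a minimal numerical sketch in the spirit of compatible-reward IRL; all names, shapes, and the toy data are our assumptions, not the paper's code. Candidate Q-features are taken as a basis of the null space of the expert policy's score matrix, so that the expert's parameters stay a stationary point for any reward built from those features.

import numpy as np

def q_feature_basis(score_matrix, tol=1e-10):
    # Rows of score_matrix are per-parameter estimates of the expert policy's
    # score (gradient of log-probability), with one column per state-action
    # pair. Any Q-vector in its null space keeps the policy gradient zero at
    # the expert's parameters, so the null space spans the candidate
    # Q-feature (and, via the Bellman equation, reward-feature) space.
    _, s, vt = np.linalg.svd(score_matrix, full_matrices=True)
    rank = int(np.sum(s > tol))
    return vt[rank:].T            # columns q satisfy score_matrix @ q = 0

# Toy check: 3 policy parameters, 8 state-action pairs.
G = np.random.randn(3, 8)
Phi = q_feature_basis(G)          # 8 x 5 basis of candidate Q-features
assert np.allclose(G @ Phi, 0.0, atol=1e-8)

The remaining question, which of the rewards spanned by these features to keep, is exactly what a second-order selection criterion is meant to answer.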

Active Learning for Reward Estimation in Inverse Reinforcement Learning

Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at “arbitrary” ...
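For intuition about what "querying the demonstrator for samples at specific states" can look like, a hedged sketch follows; the selection criterion below (entropy of the posterior-mean action distribution) is our illustration, and the paper's actual criterion may differ.

import numpy as np

def pick_query_state(action_dists):
    # action_dists has shape (n_posterior_samples, n_states, n_actions):
    # expert policies induced by reward functions sampled from the current
    # posterior. The query goes to the state where those samples disagree
    # most about what the expert would do.
    mean = action_dists.mean(axis=0)                       # (n_states, n_actions)
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=1)   # per-state entropy
    return int(np.argmax(entropy))

The agent would then ask the demonstrator for an action at that state, update its reward posterior, and repeat, instead of passively consuming whole trajectories.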

Inverse Reinforcement Learning with Locally Consistent Reward Functions

Existing inverse reinforcement learning (IRL) algorithms have assumed each expert’s demonstrated trajectory to be produced by only a single reward function. This paper presents a novel generalization of the IRL problem that allows each trajectory to be generated by multiple locally consistent reward functions, hence catering to more realistic and complex experts’ behaviors. Solving our generali...

Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions

We present a nonparametric Bayesian approach to inverse reinforcement learning (IRL) for multiple reward functions. Most previous IRL algorithms assume that the behaviour data is obtained from an agent who is optimizing a single reward function, but this assumption is hard to guarantee in practice. Our approach is based on integrating the Dirichlet process mixture model into Bayesian IRL. We pr...
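For a feel of the nonparametric prior involved, here is a small sketch of a Chinese-restaurant-process draw over "which reward function generated which trajectory"; this is our illustration only, and the paper's sampler over rewards and assignments is more involved.

import numpy as np

def crp_assignments(n_trajectories, alpha, rng=None):
    # Seat each trajectory at an existing reward "table" with probability
    # proportional to that table's size, or open a new table (a new reward
    # function) with probability proportional to alpha.
    rng = np.random.default_rng() if rng is None else rng
    assignments, counts = [], []
    for _ in range(n_trajectories):
        weights = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(weights), p=weights / weights.sum()))
        if k == len(counts):
            counts.append(1)       # a previously unseen reward function
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

# e.g. crp_assignments(10, alpha=1.0) might return [0, 0, 1, 0, 2, 1, 0, 2, 0, 3]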

Repeated Inverse Reinforcement Learning

We introduce a novel repeated Inverse Reinforcement Learning problem: the agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks that it surprises the human by acting suboptimally with respect to how the human would have acted. Each time the human is surprised, the agent is provided a demonstration of the desired behavior by the human. We formali...


Journal

Journal title: Pattern Recognition Letters

Year: 2022

ISSN: 1872-7344, 0167-8655

DOI: https://doi.org/10.1016/j.patrec.2022.01.016